Building a Sense Tagged Corpus with Open Mind Word Expert

نویسندگان

  • Timothy Chklovski
  • Rada Mihalcea
چکیده

Open Mind Word Expert is an implemented active learning system for collecting word sense tagging from the general public over the Web. It is available at http://teach-computers.org. We expect the system to yield a large volume of high-quality training data at a much lower cost than the traditional method of hiring lexicographers. We thus propose a Senseval-3 lexical sample activity where the training data is collected via Open Mind Word Expert. If successful, the collection process can be extended to create the definitive corpus of word sense information.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Open Mind Word Expert: Creating Large Annotated Data Collections with Web Users' Help

Open Mind Word Expert is an implemented active learning system that aims to create large annotated corpora by tapping into the world’s vast pool of knowledge. It does this by relying on the vast number of Web users who contribute their knowledge to data annotation. Open Mind Word Expert focuses on building semantically annotated corpora, by collecting word sense tagging from the general public ...

متن کامل

LCC-WSD: System Description for English Coarse Grained All Words Task at SemEval 2007

This document describes the Word Sense Disambiguation system used by Language Computer Corporation at English Coarse Grained All Word Task at SemEval 2007. The system is based on two supervised machine learning algorithms: Maximum Entropy and Support Vector Machines. These algorithms were trained on a corpus created from SemCor, Senseval 2 and 3 all words and lexical sample corpora and Open Min...

متن کامل

Building A Training Corpus For Word Sense Disambiguation In English-To-Vietnamese Machine Translation

The most difficult task in machine translation is the elimination of ambiguity in human languages. A certain word in English as well as Vietnamese often has different meanings which depend on their syntactical position in the sentence and the actual context. In order to solve this ambiguation, formerly, people used to resort to many hand-coded rules. Nevertheless, manually building these rules ...

متن کامل

Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias

This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the “WordNet monosemous relatives” method to construct automatically a web corpus that we have used to train disambiguation systems. The corpus-building process has highlighted important factors, such as the distribution of senses (bias). The corpus has been used to trai...

متن کامل

DutchSemCor: Building a semantically annotated corpus for Dutch

State of the art Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English language resources for developed WSD increased in the past years, while most other languages are still under-resourced. The situation is no different for Dutch. In order to overcome this data bottleneck, the DutchSemCor pro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002